Abstract

Problem: Modern higher level languages, for example, ALGOL 68, must
be defined to allow many alternate token representations to accommodate the
variety of input/output equipment available. Because of this variety, it
may not be possible to transport programs from one hardware representation
to another by any simple process of substitution.

Solution:  As  an  example  of  the  proposed  approach,  this  paper  defines 
a  six-bit  encoding  for  ALGOL  68  program  texts.   It  is  an  encoded  character 
prefix  representation,  so  it  can  be  decoded  without  lookahead.   The  encoding 
is  defined  in  detail  and  a  decoder  for  a  specific  hardware  representation 
is  given. 

The Problem: Transporting Program Texts

One of the advantages originally claimed for higher level languages
was machine independence; the "same" program would run on many different
machines. Experience has revealed that such languages are even more valuable
as programming tools because they enhance communication between programmer,
debugger, and modifier. Nonetheless, there is still considerable need for
machine independence and the transportation of programs between installations.
For example, reference (4) defines a subset of FORTRAN that is accepted by
most compilers; references (3) and (2) define ALGOL 68 representations that
require only a reasonable, widely available set of graphics. (The
transportation representation proposed below in no way depends on the
hardware representation suggestions in (2). Those suggestions were made
only to precisely define a vehicle for the decoder in Appendix B.)

In many cases, as J. McCarthy (7) has pointed out, the concrete
syntax of a language is less critical than the abstract syntax - the inherent
expressive features of the language. At least one recent language, Revised
ALGOL 68 (1, hereinafter called the "Report"), has reflected this lesson;
the defining document specifies the abstract syntax with great precision,
but leaves many of the details of token representation to the discretion
of implementors. This permits the freedom necessary to implement the
language on a wide variety of input/output devices, but raises a new problem:
token representations may be so incompatible that programs cannot be
transported by any process of simple substitution.

This paper will first explore the dimensions of the problem and then
describe a solution for ALGOL 68 in the form of a "transportation representation,"
a text encoding for interchange of programs. Without this representation,
program sharing among n implementations would require n^2 transliteration
programs; with it, each implementation need have only an encoder and a
decoder for a total of 2n transliterators. (One can conceive of a compiler
with numerous pragmat-selected front ends to accept many hardware
representations, but the multiplicity of the latter seems to doom the former.) A
further advantage of the transportation representation will be the possibility
of providing a decoder to translate to a publication language.
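
The arithmetic behind this comparison can be sketched in a few lines of Python. The function name is hypothetical, and the n^2 figure is simply the paper's estimate (counting only ordered pairs of distinct implementations would give n(n-1)).

```python
def transliterators_needed(n: int) -> tuple[int, int]:
    """Return (direct, via_interchange) program counts for n implementations."""
    direct = n * n            # the paper's estimate for pairwise transliteration
    via_interchange = 2 * n   # one encoder plus one decoder per implementation
    return direct, via_interchange


if __name__ == "__main__":
    for n in (2, 5, 10):
        direct, shared = transliterators_needed(n)
        print(f"n={n}: direct={direct}, via transportation representation={shared}")
```

The gap widens quickly: the direct approach grows quadratically, while the shared representation grows linearly.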

Simple substitution may not suffice to transport program texts for
several reasons. Sometimes one symbol is used for several functions.
For example, in ALGOL 68 the symbol "⌊" must be transliterated to either
entier or lwb depending on the context. More serious problems are
posed by "stropping," the techniques used to distinguish boldface text from
roman text. Some implementations demand apostrophes around boldface (whence
the name stropping); others require backspacing and underlining; still others
reduce clutter by specifying that boldface words are restricted and can never
be used with a non-bold meaning. Many transformations between stropping
conventions require complex analysis, including possibly tables of reserved
words. A third problem occurs if the text contains lines longer than those
accepted by the target compiler. Efforts to preserve original indentation
are warranted because of its value in reading and modifying the text.
Unfortunately, in contexts like bold words and strings it is not possible to
freely intersperse spaces and new lines. Proper treatment of overlength
lines is, in fact, a very context dependent problem.


I.  What  criteria  must  a  transportation  representation  satisfy? 

o  It must encode as much information from the original as possible.
Where ALGOL 68 provides a choice between brief and bold forms, the
programmer's choice should be recorded. When the encoding of '(b|c|d)'
is decoded it should not become 'if b then c else d fi'. Original spacing
should be recorded so some semblance can be reproduced, depending on the
line length of the decoding media.

o  Text "close" to ALGOL 68 ought to be accepted as well as perfect
programs. Programs should not be transported with syntax errors, but
someone will probably want to do it. Moreover, this provision will make
it possible to transport programs written in super-sets of ALGOL 68.

o  To be transportable to the maximum number of locations, the
representation should be encoded in a small byte size. Six bits (64 unique
values) can be represented conveniently on the transput media for almost
all machines.

o  The representation design should emphasize simplicity of decoding.
Encoders can be based on a compiler's existing token scanner.

o  The representation should not expand the size of the text by any
significant factor. The proposed representation collapses multiple blanks
to a two syllable code, so its output is significantly more compact than,
say, cards. For magnetic tape, this factor would be less important.
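
The blank-collapsing criterion can be illustrated with a small Python sketch of the encoder side. It assumes that the "consecutive spaces" function syllable is #14, that the ordinary space is #0, and that the argument syllable is a plain binary count that must fit in six bits; the function name is hypothetical.

```python
SPACE = 0                # syllable for a single space
CONSECUTIVE_SPACES = 14  # function syllable, assumed to take one count argument
MAX_ARGUMENT = 63        # largest value a six-bit argument syllable can hold


def encode_spaces(run_length: int) -> list[int]:
    """Encode a run of blanks as a list of six-bit syllables."""
    out: list[int] = []
    while run_length > 0:
        if run_length == 1:
            out.append(SPACE)            # a lone blank needs no function syllable
            run_length = 0
        else:
            chunk = min(run_length, MAX_ARGUMENT)
            out += [CONSECUTIVE_SPACES, chunk]
            run_length -= chunk
    return out


if __name__ == "__main__":
    print(encode_spaces(8))    # eight blanks become two syllables
    print(encode_spaces(64))   # a longer run is split into chunks
```

Under these assumptions any run of 2 to 63 blanks costs exactly two syllables, which is where the saving over card images comes from.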


II.   What  ALGOL  68  problems  must  a  transportation  representation  face? 

o  Represent  two  type  faces,  roman  and  bold,  and  two  cases,  upper 
and  lower. 

o  Contend with unusual graphics specified in the Report; consider
'  []  ',  '  |o  ',  '°',  not  to  mention  the  proposal  in  (2)  for  '-C*.   Contend 
with  graphics  not  mentioned  in  the  Report  but  available  at  particular 
installations  and  used  as  "other  monads"  or  "other  string  items." 

o  Deal with string and pragment tokens which can contain diverse
graphics and even non-graphics.

o  Solve the ambiguities inherent in multiple symbols for a single
operator. For ease of decoding, '⌊' should be encoded not as just itself
but according to whether it represents itself, lwb, or entier.

o  Consider the problems of implementations that allow, for instance,
different representations of ':' in its contexts as, say, label-symbol and
routine-symbol.

o  Devote some attention to the existence of national variants of
ALGOL 68. Should the transportation representation provide for several
alphabets? What should be done about national variants of the bold-tags,
the tags defined in the standard prelude, and the letters in real and bits
denotations and in format texts?

III.  Structure of the proposed representation.

Three approaches to a transportation representation seem possible:
(1) a standard representation in, perhaps, ASCII, (2) an encoded token
level transliteration, (3) a syntax oriented representation.

The  first  would  seem  a  chimera  since  it  would  be  difficult  to  get 
general  agreement  and  because  it  could  not  encode  all  the  token  structure 
and  unusual  graphics  that  might  be  available  in  some  implementations.  It 
would  have  the  advantage  of  being  both  human  readable  and  machine  readable. 

The  third  is  very  attractive  from  a  number  of  viewpoints.   Only  the 
abstract  syntax  would  be  recorded  so  a  decoder  could  easily  choose  whatever 
label-symbol  or  routine-symbol  was  applicable.   Unfortunately,  a  syntactic 
representation  takes  more  storage  (see  the  Appendices  in  (6)),  is  complex 
to  encode  and  decode,  and  implies  some  standard,  agreed  upon  grammar. 

Based  on  these  considerations,  I  chose  the  second  option,  a  token 
level  prefix  code  that  can  be  decoded  without  look-ahead.   The  encoded 
form  of  the  text  is  a  sequence  of  operator,  letter,  and  digit  syllables, 
represented  as  six  bit  quantities.   These  syllables  are  specified  in  detail 
in  Appendix  A. 

Syllables in the code can change the case and the type face of
succeeding letters. Special ALGOL 68 symbols and operators defined in the
standard prelude are represented by a prefix syllable (#7) followed by a
syllable to select the symbol (quote-symbol is thus #7, #2; '↑' as a
left-shift-operator is #7, #35). The decoder is simplified by assignment
of different codes to each of the possible uses of '↑', '~', and '⌊'.
Provision is made for 6 "national letters," 64 other nonstandard letters,
and 4160 "other monads." To reduce the size of encoded text, multiple
blanks are represented with just two syllables, and there is provision to
assign any syllable to invoke a sequence of syllables.
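
To make the "no look-ahead" property concrete, here is a minimal Python sketch of a decoder loop. It is not the decoder of Appendix B. It assumes the syllable values given in Appendix A (#11 bold, #13 roman, #7 standard symbol, #14 consecutive spaces, #15 newline, #28 and #30 case shifts), starts in roman face, renders bold tags in upper case (so case shifts are ignored), and carries only a small excerpt of the standard symbol table. The names `decode`, `letters`, `DIRECT`, and `STANDARD` are hypothetical.

```python
BOLD, ROMAN, STD, SPACES, NEWLINE, UPPER, LOWER = 11, 13, 7, 14, 15, 28, 30

# Syllables that stand for a symbol directly (section A.1.a).
DIRECT = {0: " ", 1: "|", 8: "(", 9: ")", 12: ",", 26: ":", 27: ";", 29: ":="}
# Tiny illustrative excerpt of the symbols selected by #7; not the full table.
STANDARD = {2: '"', 35: "SHL", 63: " "}


def letters(word: str) -> list[int]:
    """Helper for building test input: syllables for upper-case letters/digits."""
    return [ord(ch) - 32 for ch in word]


def decode(syllables: list[int]) -> str:
    out: list[str] = []
    bold = False                      # assumption: text starts in roman face
    it = iter(syllables)
    for code in it:                   # each syllable is handled as it arrives
        if code == BOLD:
            bold = True
        elif code == ROMAN:
            bold = False
        elif code in (UPPER, LOWER):
            pass                      # ignored: case is used for stropping here
        elif code == STD:
            arg = next(it)            # the prefix says exactly one argument follows
            out.append(STANDARD.get(arg, DIRECT.get(arg, f"<std {arg}>")))
        elif code == SPACES:
            out.append(" " * next(it))
        elif code == NEWLINE:
            out.append("\n" + " " * next(it))
        elif 16 <= code <= 25 or 33 <= code <= 58:
            ch = chr(code + 32)       # digits and letters sit at ISO position + 32
            out.append(ch if bold else ch.lower())
        elif code in DIRECT:
            out.append(DIRECT[code])
        else:
            out.append(f"<code {code}>")   # placeholder for syllables not sketched
    return "".join(out)


if __name__ == "__main__":
    text = [BOLD, *letters("BEGIN"), 0, ROMAN, *letters("X"), 0, 29, 0, 17, 0,
            BOLD, *letters("END")]
    print(decode(text))
```

Every branch decides what to emit from the current syllable and the decoder state alone; function syllables announce how many argument syllables follow, so the loop never has to peek ahead.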

Unlike  special  character  symbols,  there  is  no  implied  list  of 
bold-tags  or  standard  prelude  tags;  these  are  simply  spelled  out  using 
appropriate  type  face  changes.   Consequently,  superset  languages  may  use 
this  transportation  representation  even  if  they  have  additional  standard 
bold-tags. 

IV.  What  special  problems  may  face  implementations? 
IV. 1.   Token  separation 

Trivial  but  annoying  problems  are  posed  to  encoders  and  decoders 
by  the  need  to  determine  transitions  between  roman  tags,  bold  tags,  and 
numbers.   These  problems  are  made  a  little  more  difficult  if  the  hardware 
representation  of  the  implementation  uses  reserved  word  stropping,  but  they 
are  hardly  insurmountable.   To  avoid  expansion  of  the  text,  no  codes  are 
transmitted  to  specify  token  types.  A  decoder  must  keep  track  of  the  current 
token  type  to  provide  for  proper  separation.   In  fact,  this  requirement  is 
the  reason  for  inclusion  in  the  decoder  below  of  the  non-trivial  routine, 
CHANGE_STATE.

Encoders  in  general  may  omit  face  shifts  if  tags  are  separated 
by  spaces  or  special  characters.   Thus  only  one  face  shift  at  the  beginning 
would  be  needed  for  either 

abs  long  1  +  abs  0 
or 

x  +  sin(y)/z2  +  7. 


The spaces separate the bold tags and the special characters separate the
roman tags. The encoder may need to insert separation in several cases:

a)  Stropped  bold  tags  separated  only  by 
strop  characters:   'ref'real'. 

The  encoder  may  insert  spaces  for  strop  characters,  but  preferably  it  will 
only  mark  the  separation  by  inserting  a  bold  face  shift. 

b)  Stropped  bold  tag  followed  by  a  number. 
Again,  a  face  shift  may  be  inserted. 

c)  Number  followed  by  a  tag. 

This  is  perfectly  legal  and  no  separation  would  be  needed. 

Some  implementations  will  mark  spaces  in  tags  in  some  way. 
But,  because  tags  and  non-tag  letters  may  appear  in  format,  it  is  not 
always  trivial  to  determine  when  a  space  is  within  a  tag.   For  this 
reason,  encoders  must  specially  mark  such  spaces  (with  the  token  space 
code,  #2).   Encoders  for  reserved  word  hardware  representations  can  best 
be  built  as  an  adjunct  of  the  compiler  and  use  the  compiler's  routines 
for  distinguishing  bold  and  roman. 

Decoders  can  easily  distinguish  bold  from  roman,  but  must  properly 
strop  the  bold.   The  three  cases  discussed  for  encoders  must  be  considered 
for  the  decoder,  too.  Where  a  bold  tag  is  separated  from  a  succeeding  bold 
tag  or  number  by  no  more  than  a  face  shift  code,  the  decoder  must  insert 
stropping  characters  or  spaces  to  separate  them.   If  prefix  stropping  is 
used,  a  bold  tag  immediately  after  a  number  must  be  detected  so  the 
stropping  can  be  inserted. 




To  avoid  confusion  of  roman  and  bold,  reserved  word  implementations 
must  check  every  roman  tag  for  equality  to  a  bold  tag  and  modify  the  roman 
in  some  way--perhaps  by  the  addition  of  an  "x". 
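
A reserved-word decoder's check might look like the following Python sketch. The set of bold tags is an illustrative subset, the function name is hypothetical, and appending "x" is only the example modification mentioned above.

```python
# Illustrative subset of bold tags; a real decoder would hold the full list
# for its target implementation.
BOLD_TAGS = {"begin", "end", "if", "then", "else", "fi", "real", "int", "ref"}


def emit_roman_tag(tag: str, bold_tags: set[str] = BOLD_TAGS) -> str:
    """Return the tag unchanged unless it would be read as a reserved bold word."""
    return tag + "x" if tag in bold_tags else tag


if __name__ == "__main__":
    for tag in ("count", "real", "fi"):
        print(tag, "->", emit_roman_tag(tag))
```

The modification keeps a roman tag from silently turning into a bold word, at the cost of possible name collisions (see the discussion of puns in Section IV. 7).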

IV. 2.  Multi-symbol representations

A few implementations may choose distinct representations for
distinct symbols that are assigned the same representation by the Report
(e.g., ':' suffices for four distinct symbols in the Report but some
implementation may use another representation for one of the four, perhaps

isdefinedtoimplyresumptionatthispoint

for label-symbol). The encoder for such an implementation will map the
distinct representations into the single standard code provided by the
transportation representation (#26 represents ':'). The decoder must solve
the more context dependent problem of distinguishing among the various
meanings of the standard code. This decision to represent only the basic
graphics in the transportation representation means that the majority of
encoders will be simplified at the expense of a small number of decoders.

IV. 3.  Treatment of nonstandard graphics.

The proposed transportation representation provides for both
'national letter' and 'other monad' special characters. These characters may
appear in five different contexts: letters appear in tags and bold tags,
monads in TAO's, and both appear in strings and pragments. An encoder for
an implementation that allows special characters must assign one of the
appropriate codes to each character and output that code for each occurrence
of the character. Accompanying the encoded text should be documentation
specifying the characters the encoder has associated with each special code.

Decoders must react appropriately to each type of special character
in each of the five contexts. Default actions for each case are specified
in Appendix B. In earlier versions the decoder accepted run-time input
for (1) specification of transliterations for all special marks;
(2) specification of notes: where the special character occurs, a pointer
appears under the transliteration and the text of the note appears in the
righthand margin. Another option a decoder might provide would be a
specification of whether a TAO containing a special character should be
transliterated. The user might not want this to occur if he was able to
specify a monad to replace the special character in TAO's.

IV. 4.  Encoders and decoders for implementations of variants.

Like the Report itself, this proposal is slanted toward the English
alphabet and English identifiers in the standard prelude. Considerable
thought was devoted to allowing a kind of alphabet shift (similar to a case
shift) to specify that the text was constructed from characters in some
other alphabet. (Possibly even providing a representation for letter-aleph!)
With this scheme, however, every decoder would have had large tables of
transliterations and considerable code that would in practice have been
rarely used. Instead, this extra effort should be expended only by
implementers for variant languages. They will transliterate into English
to transport programs with the representation proposed here.

Several classes of token must be translated to English: standard
bold  tags,  tags  in  the  standard  prelude,  and  letters  in  denotations  and 
format  texts.   Other  letters  can  be  strictly  transliterated  in  such  a  way 
that  when  the  text  is  decoded  back  into  its  original  language,  it  will  be 
the  same.   The  text  will  probably  be  meaningless  English,  but  it  could  be 
compiled  with  an  English  speaking  compiler.   If  used  in  this  way,  incidentally, 
there  is  no  reason  why  the  transportation  representation  cannot  serve  equally 
well  to  transport  between  two  variant  implementations. 

IV. 5.  Spaces and visible spaces

Some  ALGOL  68  implementations  will  not  provide  a  representation  for 
visible-space.   They  must  still  encode  spaces  within  strings  to  the  standard 
representation  for  visible-space.  Moreover,  on  decoding  they  must  ignore 
ordinary  spaces  in  strings  and  convert  visible-spaces  to  spaces. 
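
The rule for spaces inside strings can be sketched as a pair of Python functions. This is an illustration under assumptions: the visible-space is taken to be #7, #63 and the ordinary space #0, only letters, digits, and spaces are handled, letter case is not tracked, and the function names are hypothetical.

```python
STD, VISIBLE_SPACE, SPACE = 7, 63, 0


def encode_string_body(text: str) -> list[int]:
    """Encode the characters between the quotes of a string denotation."""
    out: list[int] = []
    for ch in text:
        if ch == " ":
            out += [STD, VISIBLE_SPACE]        # significant space: visible-space
        elif ch.isascii() and ch.isalnum():
            out.append(ord(ch.upper()) - 32)   # letter or digit syllable
        else:
            raise ValueError(f"character {ch!r} is outside this sketch")
    return out


def decode_string_body(codes: list[int]) -> str:
    """Decode a string body, dropping ordinary spaces as mere layout."""
    out: list[str] = []
    it = iter(codes)
    for code in it:
        if code == SPACE:
            continue                           # ordinary space: ignored in strings
        if code == STD:
            arg = next(it)
            if arg != VISIBLE_SPACE:
                raise ValueError(f"standard symbol {arg} is outside this sketch")
            out.append(" ")
        elif 16 <= code <= 25 or 33 <= code <= 58:
            out.append(chr(code + 32))
        else:
            raise ValueError(f"syllable {code} is outside this sketch")
    return "".join(out)


if __name__ == "__main__":
    body = encode_string_body("A B")
    print(body, "->", repr(decode_string_body(body)))
```

Because only visible-spaces survive decoding, an encoder is free to break a long string across lines with ordinary spaces without changing the string's value.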

IV. 6.  What  to  convert  to  standard  codes 

Certain  symbols  are  represented  by  both  standard  graphics  and  bold 
tags;  the  question  arises  as  to  when  to  encode  a  bold  tag  as  the  corresponding 
standard symbol. For example, nil in a program may be encoded as either the
bold tag nil or standard code #62. If the latter option were chosen, when it
was decoded it might appear as '°', depending on the character set available.

In  general  it  is  preferable  to  encode  a  bold  tag  as  a  bold  tag 
so  the  decoded  text  is  as  close  as  possible  to  the  original.  Appendix  A. 2 
reflects  this  by  enclosing  bold  tag  representations  in  parentheses  in  the 
fourth  column.   This  signifies  that  any  encoder  I  write  will  not  output 
the  standard  code  on  the  left  (using  the  bold  tag  instead),  but  the 
decoder  will  respond  to  the  standard  code  by  producing  the  given  bold  tag. 

A few graphics are well defined and widely available but also have
bold alternatives. These I would encode as the standard code: NE, LT, LE,
GE, GT, EQ, NOT, AND, OR, OF, CO. In addition, I would encode an 'e' in
a real denotation as the code for times-ten-to-the-power. To keep the
encoder simple I would convert style-i-sub(or bus)-symbol to open(or close)-symbol
rather than to brief-sub(or bus)-symbol.

IV. 7.  Puns

A 'pun' occurs when two different sequences of codes map into the
same ALGOL 68 text. For example, the decoder in Appendix B can generate
puns six ways:

1)  Appending "x" to tags that would otherwise be mistaken for
bold reserved words. (The tags "real" and "realx" both map into "realx".)
The generation of the "x" is an attempt to prevent the more serious pun
of converting an input roman tag (which must have been stropped in the
original text) into a bold tag.

2)  Transliteration of special letters and monads. ("Zy" and
code 32 both map into "Zy". The bold tag QO and the first "other monad"
both map to "QO'".)

3)  Transliteration for illegal standard code operands. (ERR18
and standard code 18 both map to "ERR18'".)

4)  Transliterations for legal standard codes. (ELEM and standard
code 34 (window) both map to "ELEM'".)

5)  Representation of two different nomads, asterisk and times, with "*".

6)  Representation of standard codes with diphthongs. ("+*" (as two
standard codes) and standard code 41 (plus-i-times) both map to "+*".)

Puns  will  seldom  occur  in  practice,  but  for  this  reason  they  are 
all  the  more  dangerous — their  possible  existence  may  be  forgotten.  When 
they  do  occur,  they  will  usually  generate  syntax  errors  during  a  subsequent 
compilation.   If  not,  the  worst  has  occurred,  and  lengthy  frustration  may 
ensue.   To  guard  against  this,  the  decoder  must  maintain  a  symbol  table  and 
take  steps  to  correct  all  possible  puns. 
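
One way a decoder's symbol table could detect puns is to group input tags by the text they decode to and flag every group with more than one member. The Python sketch below shows only that detection step, with a hypothetical `transliterate` callback standing in for whatever the decoder does to a tag; how a pun is then corrected is left open.

```python
from collections import defaultdict
from typing import Callable, Iterable


def find_puns(tags: Iterable[str],
              transliterate: Callable[[str], str]) -> dict[str, set[str]]:
    """Map each decoded text to the distinct input tags that produce it,
    keeping only the texts produced by more than one input."""
    produced: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        produced[transliterate(tag)].add(tag)
    return {text: sources for text, sources in produced.items()
            if len(sources) > 1}


if __name__ == "__main__":
    reserved = {"real", "int"}                       # illustrative subset

    def add_x(tag: str) -> str:                      # pun source 1) above
        return tag + "x" if tag in reserved else tag

    print(find_puns(["real", "realx", "count"], add_x))
```

In the example, "real" and "realx" collide on "realx", which is exactly the first kind of pun listed above.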

V.   Conclusion 

It  should  be  noted  that  this  proposal  is  only  a  small  step  toward 
machine  independence.   The  far  more  crucial  (and  difficult)  problems  of 
arithmetic  precision  and  bits-  and  bytes-widths  still  remain.   Perhaps  these 
can be remedied by some informal agreement that so many long's specify at
least  so  much  precision.   The  compiler  would  then  give  a  warning  message 
if  it  could  not  satisfy  a  precision  request.   It  has  been  suggested  that 
mode  definitions  can  provide  machine  independence;  however,  denotations 
must  have  their  length  specified,  and  there  is  no  coercion  of  length. 

Implementors' efforts to provide encoders and decoders for the
transportation  representation  will  permit  wide  interchange  of  ALGOL  68 
programs.   This  is  one  more  step  toward  machine  independence  and  freedom 
from  the  dictates  of  computer  manufacturers. 

Appendix A.  Specification of the ALGOL 68 Transportation Representation

An ALGOL 68 program is represented in the Transportation Representation
as a sequence of six-bit syllables. Some syllables represent characters
directly (#25 represents the character '9'), some sequences represent
characters (#7, #33 is '&'), and some syllables only affect the state of
the decoder (#11 switches to boldface). To facilitate debugging, the meaning
of each code has been assigned with reference to the ISO-codes: code i is
related to the ISO character at position i+32 (e.g., #33 is 'A' and ISO
character 65 is 'A').
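
The correspondence with the ISO code is simple enough to state in Python. The function names are hypothetical, and the mapping is only a debugging aid: for many codes the related ISO character is a mnemonic for a function syllable rather than the character the syllable produces.

```python
def syllable_to_iso(code: int) -> str:
    """Return the ISO character related to a six-bit syllable (position code + 32)."""
    if not 0 <= code <= 63:
        raise ValueError("a syllable must fit in six bits")
    return chr(code + 32)


def iso_to_syllable(ch: str) -> int:
    """Inverse mapping for ISO characters at positions 32 through 95."""
    code = ord(ch) - 32
    if not 0 <= code <= 63:
        raise ValueError(f"{ch!r} has no related six-bit syllable")
    return code


if __name__ == "__main__":
    for code in (16, 25, 33, 58):
        print(f"#{code} <-> {syllable_to_iso(code)!r}")
```

A dump of encoded text printed through `syllable_to_iso` is therefore partly readable: letters and digits appear as themselves, and function syllables appear as punctuation.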

The definitions of the syllables are given in section A.1. Section A.2
specifies the representation of standard symbols and A.3 summarizes both the
definitions and the standard symbols.

A.1  Syllables and their functions

a.  Syllables that represent standard symbols directly:

    #0   space     #1   |      #8   (
    #9   )         #12  ,      #26  :
    #27  ;         #29  :=

b.  Syllables that represent digits and letters:

    #16  0         #33  A
    ...            ...
    #25  9         #58  Z

c.  Syllables that represent unassigned letters:

    #32, #59, #60, #61, #62, #63

For each of these, a decoder may substitute letters not used for codes #33
through #58 or some carefully chosen string.

d.  Syllables  with  assigned  functions : 

(Parenthesized  terms  indicate  that  succeeding  input  codes  serve 
as  arguments  to  the  function  syllable. ) 

#14 - consecutive spaces (number of spaces)

The argument syllable is interpreted as a number of spaces to be
inserted in the output text.

#2 - token space

This syllable encodes a space occurring within a tag or number.
It must be the first syllable in any string of typographical display
features (codes #14, #15, #5) in a tag or number.

#15  -  newline  (number  of  spaces) 

A  new  line  of  output  text  is  begun  and  started  with  the  indicated 
number  of  spaces. 

#5 - new page (number of spaces)

A new page of output is begun and its first line initially contains
the specified number of spaces. (Decoders may treat this function as
identical to #15 (new line).)
#28  -  upper  case 

Succeeding  output  letters  are  upper  case.   Case  does  not  affect 
the  graphic  representation  of  digits  and  special  characters.   Decoders 
may  ignore  case  shifts  if  the  output  representation  has  only  one  case  or  uses 
case  for  stropping.   To  preserve  the  appearance  of  the  text,  encoders  for 
representations that strop with case should shift to bold face and the strop
case  simultaneously;  but  case  shift  is  never  to  be  interpreted  by  decoders 
as  an  indication  of  stropping. 
#30  -  lower  case 

Shift  output  letters  to  lower  case.   See  the  remarks  for  #28 
(upper  case). 

#11  -  boldface 

Succeeding  letters  and  any  digits  following  letters  are  output  as 
bold  face  (and  stropped  as  necessary).   Type  face  has  no  effect  on  special 
characters and digits not following letters. The boldface shift may be
omitted  between  two  bold  tags  if  they  are  separated  by  spaces  or  special 
characters.   Bold  shift  may  be  inserted  to  separate  two  bold  tags  which  in 
the  original  were  separated  only  by  stropping  characters.   See  the  discussion 
in  Section  IV. 1. 
#13  -  roman  face 

Succeeding  characters  are  to  be  roman.   This  code  may  be  omitted 
between  two  roman  tags  if  they  are  separated  by  special  characters.   See  the 
discussion  in  Section  IV. 1. 
#7  -  standard  symbol  (symbol  code) 

The next code byte selects a standard symbol (according to
tables A.2 and A.3). For example, a quote-symbol is #7, #2, quote-image-symbol
is #7, #7, and a visible-space is #7, #63. #7 (standard symbol) may
also select any of the codes noted in section A.1.a which represent standard
symbols directly (thus #7, #8 could be used for open-symbol instead of #8
alone).

The standard symbols represent functions rather than symbols in
Chapter 9 of the Report. Thus there are, for example, three standard symbols
corresponding to the Report's tilde-symbol: #52 (tilde-symbol), #47
(skip-operator), and #43 (negation-operator). Moreover certain operators are
represented, though they appear only in Chapter 10: for example #45
(modulo-operator). Where a symbol only functions as a single operator,
only one standard symbol is provided: e.g., #36 (down-symbol) for the
shift-right-operator.

#3 - pragment mark (pragment delimiter, pragment mark, pragment,
pragment mark)

Because there are many pragment delimiters, the job of the decoder
is facilitated if there is a unique code to signal the bounds of a
pragment. The original pragment delimiter must be preserved, so it is
encoded between two pragment marks and they are followed by the pragment
and a third pragment mark. There is a net reduction of text if the
delimiter is comment or pragmat, and in other cases the encoded text can
be shortened with judicious use of #10 (define).

#31 - other monad (monad number)

Various character sets support various other monads. If an encoder
is written for a character set with N other monads, they will be assigned
monad numbers one to N. A decoder can represent these with other monads
available in the target character set or with a transliterated bold tag.
Other monads may also appear in strings and pragments; here they represent
some symbol an explanation of which should accompany the encoded text when
it is transported.

#4 - 4096 monads (first digit of monad number, second digit
of monad number)

Some installations will have more than 64 other monads. The excess
may be encoded with this code, where the digits of the monad number are
interpreted in base 64. The remarks with #31 (other monad) apply.

#6 - other letter (letter number)

If the six codes allocated for other letters are not enough, 64 more
letters can be represented with this code. The decoder may replace each
such letter with any suitable transliteration.
#10  -  define  (code  defined,  definition  length,  definition  string) 

To further shorten the encoded text, single codes may be assigned a
string of codes as their meaning. The codes listed in sections A.1.a and
A.1.c above are readily available for reassignment, but other codes can be
used. If a code is assigned a definition string of length zero, its
original interpretation is restored. For example, a comment delimited by
hash marks would be encoded as "#3, #7, #3, #3, text of comment, #3."
This could be shortened if the unassigned code #32 were defined to have the
value #3, #7, #3, #3. The definition would be established by the sequence
#10, #32, #4, #3, #7, #3, #3.
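
The effect of #10 (define) can be sketched as a preprocessing pass in Python. This is an illustration under assumptions, not part of the specification: the definition length is taken to be a plain count in one syllable, definition bodies are not themselves re-expanded, and the sketch does not track which syllables are arguments of other function syllables (so a redefined code used as, say, a space count would be expanded by mistake). The function name is hypothetical.

```python
DEFINE = 10


def expand_definitions(syllables: list[int]) -> list[int]:
    """Apply #10 definitions and return the syllable stream they stand for."""
    macros: dict[int, list[int]] = {}
    out: list[int] = []
    i = 0
    while i < len(syllables):
        code = syllables[i]
        if code == DEFINE:
            target, length = syllables[i + 1], syllables[i + 2]
            body = syllables[i + 3 : i + 3 + length]
            if length == 0:
                macros.pop(target, None)   # zero length restores the original meaning
            else:
                macros[target] = body
            i += 3 + length
        elif code in macros:
            out.extend(macros[code])       # replace the code by its definition string
            i += 1
        else:
            out.append(code)
            i += 1
    return out


if __name__ == "__main__":
    # Define #32 as "#3, #7, #3, #3", then use it to open a one-letter comment.
    stream = [10, 32, 4, 3, 7, 3, 3,   32, 33, 3]
    print(expand_definitions(stream))
```

With the definition in force, each hash-delimited comment opens with one syllable instead of four.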

A.2  Representations of standard symbols

                                       representations
code¹  symbol                          reference  hardware²  notes

14     point symbol                    .          .
37     times ten to the power symbol   ₁₀         e
60     times ten to the power alter.   \          (e)
2      quote symbol                    "          "
7      quote image symbol              ""         ""
0      space symbol                    space      space      7
63     space alternate                 •          (space)
54     or symbol                       ∨          or         4
6      and symbol                      ∧          &          4
33     ampersand symbol                &          (am)       4
58     differs from symbol             ≠          /=         4
28     is less than symbol             <          <          5, 6
44     is at most symbol               ≤          <=         4
39     is at least symbol              ≥          >=         4
30     is greater than symbol          >          >          5, 6
15     divided by symbol               /          /          5, 6
5      over symbol                     ÷          %          4
45     modulo operator                 ÷×         %*         8
25     percent symbol                  %          (es.)      4
34     window symbol                   □          (elem)     4
22     floor symbol                    ⌊          (fl)       4
21     entier operator                 ⌊          (entier)   8
20     lower bound operator            ⌊          (lwb)      8
23     ceiling symbol                  ⌈          (upb)      4
41     plus i times symbol             ⊥          +*         4
46     not symbol                      ¬          s or ~     4
52     tilde symbol                    ~          (tl)       4
47     skip operator                   ~          (skip)     8
43     negation operator               ~          (ng)       8
36     down symbol                     ↓          (shr)      4
53     up symbol                       ↑          (he)       4
50     raised to operator              ↑          (**)       8

1  This  code  must  follow  the  function  code  #7  (standard  symbol)  to  invoke  the 
symbol  or  operator  shown  in  the  second  column. 

2  See  Hardware  Representation  (2).   Parentheses  around  a  graphic  indicate  the 
encoder  (as  discussed  in  section  IV. 6)  will  never  output  the  corresponding 
code,  but  in  response  to  the  code  the  decoder  in  Appendix  B  will  produce 
the  graphic. 

                                       representations
code   symbol                          reference  hardware   notes
(continued)

35     shift left operator             ↑          (shl)      8
40     power operator                  **         **         8
11     plus symbol                     +          +          4
13     minus symbol                    -          -          4
16     equals symbol                   =          =          5, 6
10     times symbol                    ×          *          5, 6
51     asterisk symbol                 *          (*)        5, 6
17     assigns to symbol               =:         =:         6
29     becomes symbol                  :=         :=         6, 7
12     comma symbol                    ,          ,          7
27     semicolon symbol                ;          ;          7
26     colon symbol                    :          :          7
8      open symbol                     (          (          7
9      close symbol                    )          )          7
1      stick symbol                    |                     7
55     again symbol                    |:
59     brief sub symbol                (/
61     brief bus symbol                /)
32     at symbol                       @          @
57     is symbol                       :=:        :=:
56     is not symbol                   :≠:        :/=:
62     nil symbol                      °          (nil)
38     of symbol                                  -<
4      formatter symbol                $          $
31     brief comment symbol            ¢          co         3
3      style ii comment symbol         #

3  In  ISO/ASCII,  co  maps  to  co  and  {. . . )  maps  to  ^  ...  ji. 

4  This standard symbol is a monad defined in the Report.

5  Nomad defined in the Report.

6  Because in a TAO this code may follow a transliterated monad, the
following transliterations are sometimes used:

   code              28   30   15   16   10   51   17   29
   symbol            <    >    /    =    ×    *    =:   :=
   transliteration   LT   GT   DV   EQ   TM   ST   TO   AB

7  The code at the left is a function code that will invoke this standard
symbol directly.  (E.g., #29 alone may be used instead of #7, #29 to
encode the becomes-symbol.)

8  This symbol is not a monad and is so treated by the decoder in Appendix B.
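
The transliteration rule in note 6 can be illustrated with a small lookup. The following is a sketch in Python rather than in the ALGOL 68 of Appendix B; the names TRANSLITERATION, GRAPHIC and render are hypothetical, and the flag after_bold_monad is an assumption standing in for the decoder's record of whether the preceding monad was written as a bold word.

```python
# Transliterations from note 6, keyed by six-bit code.
TRANSLITERATION = {
    28: "LT", 30: "GT", 15: "DV", 16: "EQ",
    10: "TM", 51: "ST", 17: "TO", 29: "AB",
}

# Ordinary hardware graphics for the same codes, as read from table A.2.
GRAPHIC = {
    28: "<", 30: ">", 15: "/", 16: "=",
    10: "*", 51: "*", 17: "=:", 29: ":=",
}


def render(code: int, after_bold_monad: bool) -> str:
    """Text for a nomad or assignment code that continues an operator.

    after_bold_monad is an assumed flag: True when the operator so far
    was emitted as a bold word, so the continuation must be letters too.
    """
    table = TRANSLITERATION if after_bold_monad else GRAPHIC
    return table[code]


# An operator that began with a graphic keeps graphics ...
print("+" + render(29, after_bold_monad=False))
# ... while one that began with a bold word continues in letters.
print("OR" + render(29, after_bold_monad=True))
```

Keeping the two tables separate keeps the choice of spelling in one place, which is roughly the role the flag trans_tao appears to play in the decoder.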


A.3  Summary of codes and standard symbols

code  ISO    function      std. symbol     code  ISO  function  std. symbol

  0   space  space         space            32   @              @
  1   !      |             |                33   A    A         &
  2   "      token space   "                34   B    B         □
  3   #      pragment      #                35   C    C         ↑ (shl)
  4   $      4096 monads   $                36   D    D         ↓
  5   %      new page      ÷                37   E    E         ₁₀
  6   &      other letter  ∧                38   F    F         -c
  7   '      std. symbol   ""               39   G    G         ≥
  8   (      (             (                40   H    H
  9   )      )             )                41   I    I         ⊥
 10   *      define        ×                42   J    J
 11   +      bold          +                43   K    K         ~ (neg)
 12   ,      ,             ,                44   L    L         ≤
 13   -      roman         -                45   M    M         ÷×
 14   .      spaces        .                46   N    N         ¬
 15   /      new line      /                47   O    O         ~ (skip)
 16   0      0             =                48   P    P         **
 17   1      1             =:               49   Q    Q
 18   2      2                              50   R    R         ↑ (pow)
 19   3      3                              51   S    S         *
 20   4      4             ⌊ (lwb)          52   T    T         ~
 21   5      5             ⌊ (entier)       53   U    U         ↑
 22   6      6             ⌊                54   V    V         ∨
 23   7      7             ⌈                55   W    W         |:
 24   8      8                              56   X    X         :≠:
 25   9      9             %                57   Y    Y         :=:
 26   :      :             :                58   Z    Z         ≠
 27   ;      ;             ;                59   [              [
 28   <      upper case    <                60   \              \
 29   =      :=            :=               61   ]              ]
 30   >      lower case    >                62   ^              ∘
 31   ?      other monad   ¢                63   _              •

code          value of six-bit code byte to invoke this function and
              standard symbol.

ISO           graphic in position code+32 in the ISO code.

function      the action or graphic selected by this code as a function.

std. symbol   standard symbol selected if this code appears as an
              operand to code 7.
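
The ISO column follows directly from the rule "position code+32". As a minimal sketch (in Python, not the ALGOL 68 of Appendix B; the function names iso_graphic and code_of are hypothetical), the mapping and its inverse are:

```python
def iso_graphic(code: int) -> str:
    """Return the ISO graphic that the summary table lists for a six-bit code."""
    if not 0 <= code <= 63:
        raise ValueError("a code syllable must fit in six bits")
    return chr(code + 32)


def code_of(graphic: str) -> int:
    """Inverse mapping: the six-bit code whose ISO graphic is `graphic`."""
    code = ord(graphic) - 32
    if not 0 <= code <= 63:
        raise ValueError("graphic is outside ISO positions 32..95")
    return code


# Codes 16..25 land on the digits and 33..58 on the upper-case letters.
print([iso_graphic(c) for c in (16, 25, 33, 58)])
```

This is why a transported text can be punched or listed on any device that has the 64 graphics in ISO positions 32 to 95, even before it is decoded.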
Appendix B.  Decoder from the Transportation Representation to a
Hardware Representation

B.1  The hardware representation

The decoder is written in, and produces text in, the ISO/ASCII/EBCDIC
hardware representation defined in (2).  The principal differences from the
reference language are '-<' for of, '?' for then and else, and IN or OUT
for '|' meaning in or out.

With  three  exceptions,  bold  tags  defined  in  the  report  are  reserved 
as  recommended  in  (2).  User  defined  bold  tags  and  i,  im,  and  re  must  be 
marked  by  a  final  apostrophe.   Case  is  optional  for  all  tokens,  but  in 
writing  the  program  below  I  exercised  this  option  to  make  bold  tags  upper 
case.   (The  program  output  depends  on  the  case  specified  by  its  input. ) 

Tags and numbers may contain typographical display features (R9.4d),
but each string of them in a tag must contain at least one underline ("_").

As a concession to this decoder, the compiler allows the following
diphthongs as monads: +*, >=, <=, and /=.  (+* is the representation of
the plus-i-times-symbol.)  This provision allows operators like >=< and
/=/, but no ambiguity arises.  Indeed, no ambiguity would arise if any
number of nomads were allowed in operators, instead of at most two.

B.2  Description of the decoder

The input is on the file 'CODE_IN', a stream of six-bit code
syllables as defined in Appendix A.  The output is a "reconstituted"
ALGOL 68 program on the file 'RA68_OUT'; a listing is also produced on
'STAND_OUT'.

In general, the main routine of the decoder reads a code from
the input and uses its value to select and execute an element of the
'row-proc-void' CODE_FUNCTION.  The element is usually one of the
procedures whose name begins with 'PUT_'.  These in turn append characters
to the output stream with the operator '+/:='.
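
The table-driven main routine described above might be sketched as follows. This is a Python illustration under simplifying assumptions, not the decoder of section B.3: only the space, spaces, digit and letter codes are handled, every other code is ignored, and the token automaton and line breaking are left out. The name decode and the helper names are hypothetical.

```python
from typing import Callable, Iterator, List


def decode(codes: List[int]) -> str:
    """Dispatch each six-bit code through a 64-entry table of procedures."""
    stream: Iterator[int] = iter(codes)
    out: List[str] = []
    current = 0  # most recent code read, like curr_code in the decoder

    def getch() -> int:
        """Fetch an operand code from the same input stream."""
        nonlocal current
        current = next(stream)
        return current

    def put_dig() -> None:
        out.append("0123456789"[current - 16])

    def put_let() -> None:
        out.append("abcdefghijklmnopqrstuvwxyz"[current - 33])

    def put_spaces() -> None:
        out.append(" " * getch())  # code 14 is followed by a count

    def ignore() -> None:
        pass  # codes this sketch does not model

    table: List[Callable[[], None]] = [ignore] * 64
    table[0] = lambda: out.append(" ")
    table[14] = put_spaces
    for c in range(16, 26):
        table[c] = put_dig
    for c in range(33, 59):
        table[c] = put_let

    for code in stream:
        current = code
        table[code]()
    return "".join(out)


print(decode([33, 34, 0, 17, 18]))
```

Because each entry is just a procedure, redefining a code (function code 10 in the decoder) amounts to overwriting one table element, which is the reason the decoder keeps a permanent copy of the table alongside the working one.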
Because typographical display features are illegal in bold tags
and because string continuations may not be indented, it is necessary
for the decoder to keep track of what type of token it is processing.  At
each point that might be the start of a token, CHANGE_STATE is called.
If necessary, it terminates the old token and starts the next; it may
introduce blanks or stropping characters.  These functions are controlled
by an automaton with nine states and nine inputs as shown in Figure B.1.
Before exiting, CHANGE_STATE usually calls SET_BREAK to store the location
of the beginning of the new token.  If the old token extends beyond the
maximum line length, SET_BREAK calls BREAK_LINE to interrupt the line.
In most cases, the entire old token is moved to the new line and the new
line is indented seven spaces more than the last indentation specified
with a new line code.
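
The common case of line breaking, in which the whole last token moves to a continuation line, can be sketched as below. This is a simplified Python illustration, not a transcription of BREAK_LINE: it assumes the token fits on the new line, uses a 0-based breakpoint, and omits the special cases for bold tags, strings and very deep indentation. The names break_line, MXLN and EXTRA_INDENT are hypothetical (the decoder calls the line limit mxln).

```python
from typing import Tuple

MXLN = 72          # maximum line length used by the decoder
EXTRA_INDENT = 7   # continuation lines are indented seven more spaces


def break_line(outline: str, breakpoint: int, indent: str) -> Tuple[str, str, int]:
    """Split an over-long line in front of its last token.

    outline    -- the line being built (longer than MXLN)
    breakpoint -- 0-based index in outline where the last token starts
    indent     -- blanks set by the most recent new-line code
    Returns (finished line, new line being built, new breakpoint).
    """
    finished = outline[:breakpoint].rstrip()
    new_indent = indent + " " * EXTRA_INDENT
    return finished, new_indent + outline[breakpoint:], len(new_indent)


line = "  " + "b" * 50 + " := " + "a" * 30   # 86 characters, too long
done, rest, bp = break_line(line, 56, "  ")
print(len(done) <= MXLN, len(rest) <= MXLN, bp)
```

Recording the breakpoint at every token start is what lets the break be made after the fact, once the line is found to be too long, without any lookahead in the input.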

Certain  inputs  are  ignored: 

-  newlines  in  strings, 

-  all  spaces  in  a  tag  following  a  'token  space'  code, 

-  spaces  following  a  point  where  a  line  was  broken 

(if  the  break  was  due  to  spaces  on  the  previous  line). 

Errors  are  treated  by  just  counting  them.   In  a  production  decoder, 
they  would  be  indicated  appropriately  on  the  STAND_OUT  listing.   I  wrote 
the  code  to  do  this,  but  it  added  bulk  without  light  and  was  discarded 
from  the  current  version.   The  following  errors  are  detected: 

A  'token  space'  code  used  outside  a  tag  or  number. 

A  'token  space'  code  in  a  tag  but  not  followed  by  a 
continuation  of  the  tag. 

A  'token  space'  code  in  a  number,  but  not  followed  by  a 
continuation  of  the  number. 

An illegal operand for 'standard code'.

A  TAO  that  has  too  many  nomads. 

A  'visible  space'  occurring  outside  a  string. 

An  appropriate  action  is  taken  in  each  case,  and  processing  continues. 
B.3  The Decoder

BEGIN COMMENT Decoder for Transportation Representation COMMENT

# The perm_code_function array is the permanent list of
  procedures to be executed for each of the 64 input codes.
  To provide for definitions, the permanent array is copied
  into the code_function array, which is modified when a
  definition occurs.  The main program will be

     WHILE decoding DO code_function(getch) OD

  where getch gets the next code from the input file, code_in. #

(0:63) PROC VOID perm_code_function, code_function;

FOR i FROM 16 TO 25 DO
   perm_code_function (i) := put_dig OD;
FOR i FROM 32 TO 63 DO
   perm_code_function (i) := VOID: put_let (curr_code) OD;

perm_code_function (0:15) := (0:15) PROC VOID
   (#0  space #          VOID: put_spaces(1),
    #1  '|' #            VOID: put_std(1),
    #2  token space #    VOID: put_toksp,
    #3  pragment #       VOID: put_pragment,
    #4  4096 monads #    VOID: put_bold_monad
                               ("Q"+whole (64*getch+getch+64, 0)),
    #5  new page #       VOID: put_line(getch),
    #6  other letters #  VOID: put_let(getch+64),
    #7  std. symbol #    VOID: put_std(getch),
    #8  '(' #            VOID: put_std(8),
    #9  ')' #            VOID: put_std(9),
    #10 define #         VOID: set_define,
    #11 boldface #       VOID: (curr_face := bold_face; tagstart := TRUE),
    #12 ',' #            VOID: put_std(12),
    #13 romanface #      VOID: (curr_face := roman_face; tagstart := TRUE),
    #14 spaces #         VOID: put_spaces(getch),
    #15 newline #        VOID: put_line(getch) );

perm_code_function (26:31) := (26:31) PROC VOID
   (#26 ':' #            VOID: put_std(26),
    #27 ';' #            VOID: put_std(27),
    #28 upper case #     VOID: curr_case := upper_case,
    #29 ':=' #           VOID: put_std(29),
    #30 lower case #     VOID: curr_case := lower_case,
    #31 other monads #   VOID: put_bold_monad
                               ("Q" + whole (getch, 0)));
code_function := perm_code_function;

# Line breaks and token delimitation are controlled by a finite
  state automaton which is activated at places of possible start
  of token.  The states and input symbols have these mnemonics: #

INT out_st = 1,  tao_st = 2,  bold_st = 3,  tag_st = 4,
    tagsp_st = 5,  num_st = 6,  numsp_st = 7,
    pment_st = 8,  str_st = 9;

INT space_in = 1,  toksp_in = 2,  std_in = 3,
    tao_in = 4,  pment_in = 5,  rom_in = 6,  bold_in = 7,
    dig_in = 8,  quot_in = 9;

# These flags also maintain state information #

BOOL unspaced := TRUE,  # TRUE except during that part of a tag that
          follows a space in the tag (which is replaced by an underbar) #
     tagstart := TRUE,  # set TRUE whenever the next digit or letter
          could start a token #
     skipping_spaces := FALSE;  # TRUE after underbar inserted in tag
          or sometimes after breaking a line.  Causes spaces to be
          ignored. #

INT curr_state := out_st;  # remember what state we are in #

PROC change_state = (INT input) VOID:
   (INT next_state := (input IN out_st, out_st, out_st,
         tao_st, pment_st, tag_st, bold_st, num_st, str_st);
    tagstart := input <= pment_in;
    CASE curr_state IN
    #1 out_st #    IF input /= toksp_in THEN set_break;
                      IF input /= space_in THEN
                         skipping_spaces := FALSE FI FI,
    #2 tao_st #    (tao_cnt := 0;
                    IF input = std_in OR input = tao_in
                    THEN set_break FI),
    #3 bold_st #   IF input /= dig_in THEN
                      IF NOT is_reserved (last_tok_out) THEN +/:= "'"
                      ELIF (input = bold_in OR input = rom_in)
                           & NOT line_has_left(0) THEN +/:= " " FI;
                      set_break;  tao_cnt := 0 FI,
    #4 tag_st #
       IF input /= rom_in AND input /= dig_in THEN
          IF input = toksp_in THEN
             next_state := tagsp_st;
             skipping_spaces := TRUE;
             unspaced := FALSE
          ELSE IF unspaced & is_reserved (last_tok_out)
               THEN +/:= "_" FI;
               unspaced := TRUE;
               IF input = bold_in
                  & NOT line_has_left(0)
               THEN +/:= " " FI
          FI;
          set_break FI,
    #5 tagsp_st #
       IF input = space_in THEN next_state := tagsp_st
       ELIF input = rom_in OR input = dig_in THEN
          skipping_spaces := FALSE;
          next_state := tag_st
       ELIF input = toksp_in THEN
          next_state := tagsp_st;
          set_break
       ELSE error ("Tag space not followed by more tag.");
          unspaced := TRUE;  skipping_spaces := FALSE;
          set_break FI,
    #6 num_st #    IF input /= dig_in THEN
          IF input = rom_in & curr_code = 50 # an "r" # THEN
             next_state := num_st
          ELSE IF (input = rom_in OR input = bold_in
                   OR (input = toksp_in
                       ? next_state := numsp_st;  TRUE
                       ? FALSE))
                  & NOT line_has_left(0)
               THEN +/:= " " FI;
               set_break
          FI FI,
    #7 numsp_st #
       IF input = space_in THEN
          next_state := numsp_st
       ELIF input = toksp_in THEN
          next_state := numsp_st;  set_break
       ELIF input = dig_in THEN set_break
       ELIF input = rom_in & curr_code = 50 # an "r" #
       THEN next_state := num_st
       ELSE error ("Space in number not followed by number.");
          set_break FI,
    #8 pment_st #  (set_break;
          next_state := (input = pment_in ? out_st ? pment_st);
          IF input = quot_in THEN tagstart := TRUE FI),
    #9 str_st #
       IF input = quot_in THEN
          tagstart := TRUE;  next_state := out_st
       ELSE tagstart := FALSE;  next_state := str_st FI
    ESAC;
    curr_state := next_state);  # end of change_state #

# The following 'put_' routines are called on in response to
  specific codes from the input stream #

PROC put_dig = VOID:
   (IF tagstart THEN change_state (dig_in) FI;
    +/:= "0123456789"(curr_code - 16 # a "0" # + 1));

INT roman_face = rom_in,  bold_face = bold_in,
    upper_case = 2,  lower_case = 1;
INT curr_face := roman_face,  curr_case := lower_case;

PROC put_let = (INT let) VOID:
   (IF tagstart OR curr_state = num_st
    THEN change_state (curr_face) FI;
    +/:= IF let > 63 THEN "Z" + whole (let, 0)
         ELIF let >= 59 THEN "Z" + "aeiou"(let-58)
         ELIF let = 32 THEN "Zy"
         ELIF curr_case = upper_case THEN
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"(let-32)
         ELSE "abcdefghijklmnopqrstuvwxyz"(let-32) FI);
PROC put_toksp = VOID:
   (IF curr_state <= bold_st OR curr_state >= pment_st THEN
       error ("Token space outside tag or number.") FI;
    change_state (toksp_in));
PROC put_spaces = (INT nsp) VOID:
   (change_state (space_in);
    IF NOT skipping_spaces THEN +/:= nsp*" " FI);
PROC put_line = (INT nsp) VOID:
   IF curr_state /= str_st THEN
      change_state (space_in);
      IF outline /= brind THEN # there is text #
         output_string (outline) FI;
      breakpoint := nsp+1;
      brind := outline := indent := nsp*" " FI;
BOOL pragbody := FALSE;  # TRUE while outputting pment body #
STRING pragdel;  # save delimiter for end of pment #
PROC put_pragment = VOID:
   IF curr_state /= pment_st THEN
      change_state (pment_in);  pragbody := FALSE
   ELIF NOT pragbody THEN
      pragbody := TRUE;  pragdel := last_tok_out
   ELSE +/:= pragdel;  change_state (pment_in) FI;

PROC put_std = (INT stdno) VOID:  # one of the standard symbols #
   CASE std_type(stdno) IN
   #1 monad #       put_monad (std_repr (stdno)),
   #2 nomad #       put_nomad (stdno),
   #3 :=, =: #      put_asgn (stdno),
   #4 space #       put_spaces(1),
   #5 " #           (change_state (quot_in);
                     skipping_spaces := (curr_state = str_st);
                     +/:= std_repr (stdno)),
   #6 error #       (error ("Unknown standard code.");
                     change_state (std_in);  +/:= std_repr (stdno)),
   #7 other #       (change_state (std_in);  +/:= std_repr (stdno)),
   #8 boldother #   (change_state (bold_in);  +/:= std_repr (stdno);
                     tagstart := TRUE),
   #9 boldmonad #   put_bold_monad (std_repr (stdno)),
   #10 visible space #  IF curr_state = str_st THEN +/:= " "
                     ELSE error ("Visible space outside string.");
                        put_spaces(1) FI ESAC;  # end of put_std #
INT tao_cnt := 0;  # keeps track of how much tao has been put #
BOOL trans_tao := FALSE;  # TRUE if had to transliterate monad #
PROC put_monad = (STRING mon_repr) VOID:
   (change_state (tao_in);  tao_cnt := 1;
    trans_tao := FALSE;  +/:= mon_repr);
PROC put_bold_monad = (STRING mon_repr) VOID:
   (change_state (bold_in);  tao_cnt := 1;
    trans_tao := TRUE;  +/:= mon_repr);
PROC put_nomad = (INT nomad_num) VOID:
   (CASE tao_cnt + 1 IN
    # 0 #  (change_state (tao_in);
            tao_cnt := 1;  trans_tao := FALSE),
    # 1 #  tao_cnt := 2,
    # 2 #  error ("TAO too long.") ESAC;
    +/:= (NOT trans_tao ? std_repr (nomad_num)
          ? INT x;  char_in_string (REPR nomad_num, x, nomad_codes);
            "TMDVEQLTGTST"(2*x-1 : 2*x)));
PROC put_asgn = (INT asgn_num) VOID:
   (IF tao_cnt = 0 THEN change_state (tao_in);
       trans_tao := FALSE FI;
    +/:= IF tao_cnt > 0 & trans_tao THEN
            (asgn_num = 17 ? "TO" ? "AB")
         ELSE std_repr (asgn_num) FI;
    tao_cnt := 0);


(0:63) STRING std_repr;  # usual text for each standard code #
(0:63) INT std_type;     # for CASE in put_std #
BEGIN # this block is a documentary device.  It is used to
        introduce 't' so as to list in parallel the
        initial values for std_repr and std_type #
   INT monad = 1,  nomad = 2,  asgn = 3,  space = 4,
       quote = 5,  error = 6,  other = 7,  bold_other = 8,
       bold_monad = 9,  visible_space = 10;
   (0:63) STRUCT (STRING std_repr, INT std_type) t =
   ((# 0 space    #  " ",         space),
    (# 1 stick    #  "?",         other),
    (# 2 quote    #  """",        quote),
    (# 3 hash     #  "#",         other),
    (# 4 dollar   #  "$",         other),
    (# 5 over     #  "%",         monad),
    (# 6 and      #  "&",         monad),
    (# 7 q image  #  """""",      other),
    (# 8 l paren  #  "(",         other),
    (# 9 r paren  #  ")",         other),
    (#10 times    #  "*",         nomad),
    (#11 plus     #  "+",         monad),
    (#12 comma    #  ",",         other),
    (#13 minus    #  "-",         monad),
    (#14 point    #  ".",         other),
    (#15 slash    #  "/",         nomad),
    (#16 equal    #  "=",         nomad),
    (#17 assigns  #  "=:",        asgn),
    (#18          #  " ERR18' ",  error),
    (#19          #  " ERR19' ",  error),
    (#20 flr(lwb) #  "LWB",       bold_other),
    (#21 flr(ent) #  "ENTIER",    bold_other),
    (#22 floor    #  "FL",        bold_monad),
    (#23 ceiling  #  "UPB",       bold_monad),
    (#24          #  " ERR24' ",  error),
    (#25 percent  #  "PC",        bold_monad),
    (#26 colon    #  ":",         other),
    (#27 semicol  #  ";",         other),
    (#28 lessthan #  "<",         nomad),
    (#29 becomes  #  ":=",        asgn),
    (#30 greater  #  ">",         nomad),
    (#31 cent     #  "CO",        bold_other),
    (#32 at       #  "@",         other),
    (#33 amper    #  "AND",       bold_monad),
    (#34 window   #  "ELEM",      bold_monad),
    (#35 up(shl)  #  "SHL",       bold_other),
    (#36 down     #  "SHR",       bold_monad),
    (#37 subten   #  "e",         other),
    (#38 of       #  "-<",        other),
    (#39 grt eq   #  ">=",        monad),
    (#40          #  " ERR40' ",  error),
    (#41 plus i   #  "+*",        monad),
    (#42          #  " ERR42' ",  error),
    (#43 til(not) #  "NOT",       bold_other),
    (#44 less eq  #  "<=",        monad),
    (#45 mod      #  "%*",        other),
    (#46 not      #  "~",         monad),
    (#47 til(skp) #  "SKIP",      bold_other),
    (#48 **       #  "**",        other),
    (#49          #  " ERR49' ",  error),
    (#50 up(pow)  #  "**",        other),
    (#51 asterisk #  "*",         nomad),
    (#52 tilde    #  "TL",        bold_monad),
    (#53 uparrow  #  "UP",        bold_monad),
    (#54 or       #  "OR",        bold_monad),
    (#55 again    #  "?:",        other),
    (#56 isnt     #  ":/=:",      other),
    (#57 is       #  ":=:",       other),
    (#58 differs  #  "/=",        monad),
    (#59 sub      #  "(",         other),
    (#60 backsls  #  "e",         other),
    (#61 bus      #  ")",         other),
    (#62 nil      #  "NIL",       bold_other),
    (#63 vis spc  #  " ",         visible_space));
   std_repr := std_repr-<t;
   std_type := std_type-<t
END;  # block to initialize std_repr and std_type #
STRING nomad_codes = REPR 10 + REPR 15 + REPR 16 +
                     REPR 28 + REPR 30 + REPR 51;
   # times, divide, equal, less, greater, asterisk #

PROC is_reserved = (STRING tag) BOOL:
   (INT reserved_limit = 7,    # length of longest reserved word #
        number_reserved = 98;  # algorithm good to 128 #
    IF UPB tag > reserved_limit THEN FALSE
    ELSE (1:number_reserved) STRING reserved_word =
       ("ABS", "AND", "ARG", "AT", "BEGIN", "BITS", "BOOL", "BY",
        "BYTES", "CASE", "CHANNEL", "CHAR", "CO", "COMMENT",
        "COMPL", "CONJ", "DIVAB", "DO", "DOWN", "ELEM", "ELIF",
        "ELSE", "EMPTY", "END", "ENTIER", "EQ", "ESAC", "EXIT",
        "FALSE", "FI", "FILE", "FLEX", "FOR", "FORMAT", "FROM",
        "GE", "GO", "GOTO", "GT", "HEAP", #I# "IF", #IM# "IN",
        "INT", "IS", "ISNT", "LE", "LENG", "LEVEL", "LOC", "LONG",
        "LT", "LWB", "MINUSAB", "MOD", "MODAB", "MODE", "NE",
        "NIL", "NOT", "OD", "ODD", "OF", "OP", "OR", "OUSE", "OUT",
        "OVER", "OVERAB", "PAR", "PLUSAB", "PLUSTO", "POW", "PR",
        "PRAGMAT", "PRIO", "PROC", #RE# "REAL", "REF", "REPR",
        "ROUND", "SEMA", "SHL", "SHORT", "SHORTEN", "SHR", "SIGN",
        "SKIP", "STRING", "STRUCT", "THEN", "TIMESAB", "TO",
        "TRUE", "UNION", "UP", "UPB", "VOID", "WHILE");
       STRING uctag := tag;
       FOR i TO UPB uctag DO INT loc;
          IF char_in_string (uctag(i), loc, "abcdefghijklmnopqrstuvwxyz")
          THEN uctag (i) := "ABCDEFGHIJKLMNOPQRSTUVWXYZ" (loc) FI OD;
       INT loc := 64,  incr := 32;  BOOL found := FALSE;
       TO 7 WHILE NOT found DO
          IF reserved_word(loc) > uctag THEN loc -:= incr
          ELIF reserved_word(loc) = uctag THEN found := TRUE
          ELSE loc +:= incr;
             IF loc > number_reserved THEN loc := number_reserved FI
          FI;  incr %:= 2 OD;
       found FI);  # end of is_reserved #

# input of codes & definition of codes (code 10) #
INT curr_code;  # filled by getch #
CHAR input_char;  # used by getch and logical_file_end #
FILE code_in;
open (code_in);  make_conv (code_in, complete_conv);
on_logical_file_end (code_in, (REF FILE f) BOOL:
      (input_char := REPR 5;  # new page code #
       decoding := FALSE;  TRUE));

# definitions are stored in the following #
(0:63) FLEX (1:0) INT definition;  # store definition #
FLEX (1:0) INT inputs;  # copy definition for reading by getch #
INT inptr := -1;  # pointer into inputs used by getch #

PROC set_define = VOID:  # called in response to code 10 #
   IF INT code = getch;  INT len = getch;  # not collateral #
      len = 0 THEN  # restore prescribed meaning #
      code_function (code) := perm_code_function (code);
      definition (code) := ()
   ELSE  # define the code to be the next 'len' codes #
      code_function (code) := put_defn;
      definition (code) := HEAP (1:len) INT;
      FOR i TO len DO
         definition (code)(i) := getch OD FI;

PROC put_defn = VOID:  # code_function (defined code) = put_defn #
   (INT ui = UPB inputs - inptr + 1,
        ud = UPB definition (curr_code);
    (1 : ui+ud) INT tin;
    tin(1:ud) := definition (curr_code)( : );
    tin(ud+1:) := inputs (inptr:);
    inputs := tin;  inptr := 0);
PROC getch = INT:  # called on to get a code from input #
   IF inptr < 0 THEN get (code_in, input_char);
      curr_code := ABS input_char
   ELIF inptr >= UPB inputs THEN inptr := -1;  getch
   ELSE curr_code := inputs (inptr +:= 1) FI;

# output routines and routines to access output string: outline #

STRING outline := "",  # line being built #
       indent := "",   # blanks put at start of line by new line #
       brind := "";    # blanks put at start of most recent line
                         (as possibly broken) #
INT mxln = 72;  # maximum number of characters in line #
INT breakpoint := 1,  # start of most recent token
                        (a potential line break spot) #
    linenumber := 0;  # for numbering output lines #
FILE ra68_out;  open (ra68_out);  # the reconstituted text #
OP +/:= = (STRING s) VOID:  outline +:= s;
PROC line_has_left = (INT n) BOOL:  mxln - UPB outline >= n;
PROC last_tok_out = STRING:  outline (breakpoint:);
PROC set_break = VOID:
   (WHILE UPB outline > mxln DO break_line OD;
    breakpoint := UPB outline + 1);
PROC output_string = (STRING line) VOID:
   (STRING tline = line + (mxln - UPB line + 1)*" "
                   + whole (linenumber +:= 100, 7);
    put (ra68_out, tline);
    put (standout, tline));
PROC break_line = VOID:  # terminate input line longer than mxln #
   IF breakpoint = UPB brind + 1 & brind /= ""
      & curr_state = bold_st THEN
      # no breakpoint, unindent line #
      outline := last_tok_out;  breakpoint := 1
   ELSE  # move last token to new line rather than split it #
      brind :=  # indent for rest of line #
         (UPB indent > mxln%2 + 5
          ? (mxln%2)*" " ? indent+7*" ");
      IF breakpoint = 1 THEN  # must split token anyway #
         breakpoint := mxln+1;
         IF curr_state = bold_st THEN  # truncate bold #
            outline := outline (1:mxln)
         ELIF curr_state = str_st THEN brind := ""
         ELIF curr_state = tag_st THEN  # insert "_" #
            outline := outline (1:mxln) + "_"
                       + outline (mxln+1:) FI FI;
      IF curr_state = out_st THEN skipping_spaces := TRUE FI;
      output_string (outline (1:mxln));
      outline := brind + last_tok_out;
      breakpoint := UPB brind + 1
   FI;  # end break_line #

# errors #

INT nerrs := 0;

PROC error = (STRING msg) VOID:
   nerrs +:= 1;
   # in an implementation, the "msg" would be printed
     with a pointer to its location #

# main program #

BOOL decoding := TRUE;  # set FALSE by logical_file_end(code_in) #
WHILE decoding DO
   code_function(getch) OD;
put (standout, "There were "+whole (nerrs, 0)+" errors.")
END;  # of decoder #
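
The reserved-word test in is_reserved is a fixed-step binary search: it starts at position 64 with a step of 32, halves the step on each probe, and clamps the position to the table length, so seven probes cover any table of up to 127 entries. The following Python sketch restates that search for illustration; it is not part of the decoder, it uses Python's own string comparison and upper-casing, and the names RESERVED and is_reserved are carried over only for readability.

```python
RESERVED = (
    "ABS AND ARG AT BEGIN BITS BOOL BY BYTES CASE CHANNEL CHAR CO COMMENT "
    "COMPL CONJ DIVAB DO DOWN ELEM ELIF ELSE EMPTY END ENTIER EQ ESAC EXIT "
    "FALSE FI FILE FLEX FOR FORMAT FROM GE GO GOTO GT HEAP IF IN "
    "INT IS ISNT LE LENG LEVEL LOC LONG LT LWB MINUSAB MOD MODAB MODE NE "
    "NIL NOT OD ODD OF OP OR OUSE OUT OVER OVERAB PAR PLUSAB PLUSTO POW PR "
    "PRAGMAT PRIO PROC REAL REF REPR ROUND SEMA SHL SHORT SHORTEN SHR SIGN "
    "SKIP STRING STRUCT THEN TIMESAB TO TRUE UNION UP UPB VOID WHILE"
).split()


def is_reserved(tag: str) -> bool:
    """Fixed-step binary search over the sorted reserved-word table."""
    if len(tag) > 7:              # longer than the longest reserved word
        return False
    uctag = tag.upper()
    loc, incr = 64, 32            # 1-based position and current step
    for _ in range(7):
        word = RESERVED[loc - 1]  # the ALGOL 68 table is indexed from 1
        if word == uctag:
            return True
        if word > uctag:
            loc -= incr
        else:
            loc = min(loc + incr, len(RESERVED))
        incr //= 2
    return False


print(is_reserved("begin"), is_reserved("count"))
```

The fixed probe count avoids maintaining explicit low and high bounds, at the cost of requiring the table to stay sorted and shorter than 128 entries, as the comment "algorithm good to 128" in the listing indicates.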